Abstract

The liver is the largest internal organ of the human body. It converts food intake into useful
nutrients, helps to store them, and converts toxic molecules into harmless substances. Recent
studies nevertheless report a significant number of deaths due to liver disease, mainly attributed
to unhealthy diets and routines: in the rush of work, people neglect their health, and the liver is
significantly affected. Predicting liver disease quickly and with high accuracy is therefore an
important concern. Liver tissue undergoes deformation or abnormality more slowly than other body
tissues, which makes detection more difficult. In recent decades, automatic decision-making systems
and tools have taken on a significant role in the medical field. Because medicine deals with human
life, machine learning, deep learning, artificial intelligence, and big data can support rapid and
appropriate treatment, helping physicians to make the correct decision at the right moment and to
choose the appropriate procedure. In this regard, this study provides an extensive review of the
progress in applying Artificial Intelligence to forecasting and detecting liver diseases, then
summarizes the limitations of the related studies, followed by future research.
Keywords: Liver Diseases, Machine learning, Data Mining, Deep learning, Artificial Intelligence. 


1. Introduction 

Machine learning (ML) techniques help us to make better decisions and to distinguish many diseases
with high accuracy. Medical fields produce and collect large volumes of data that can be processed
using machine learning to improve the efficiency of patient care and to reduce the time of treatment.
Machine learning has a vital role in medical science, as this field deals with human life and well-being.
The dataset used in this study contains 583 records: 416 for liver-disease patients and 167 for
non-liver patients. The data were collected from the medical test records of patients from the
North-East of Andhra Pradesh, India, and are available in the UCI repository. Of the 583 records,
441 are from male patients and 142 are from female patients.

In this paper, a machine learning method is used to predict liver disease and to measure its
prediction accuracy. To this end, a logistic regression algorithm is first built to predict liver
disease at an early stage. Finally, the performance of the proposed algorithm is assessed by
applying it to a liver database.


2. Methodology 
The attributes (independent and dependent variables) on which liver disease depends are listed below: 


Age: Age of patients
Gender: Gender of patients

Flowchart of our work:

Multiple Logistic Regression - Multiple logistic regression is a machine learning algorithm that
predicts a single binary output from one or more other variables. It is also used to quantify the
numerical relationship between those variables.


FORMULA OF COEFFICIENT OF MULTIPLE LINEAR REGRESSION:

$$b_i = \frac{n \sum x_i y - \left(\sum x_i\right)\left(\sum y\right)}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$

The model of multiple regression can be represented as:

$$Y = a + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$$

Where $Y$ = Dependent Variable (Dataset)
$a$ = Constant (intercept)
$b_1$ = Coefficient of first independent variable
$b_2$ = Coefficient of second independent variable
$b_3$ = Coefficient of third independent variable
$b_4$ = Coefficient of fourth independent variable
$b_5$ = Coefficient of fifth independent variable
$b_6$ = Coefficient of sixth independent variable
$b_7$ = Coefficient of seventh independent variable
$b_8$ = Coefficient of eighth independent variable
$b_9$ = Coefficient of ninth independent variable
$b_{10}$ = Coefficient of tenth independent variable
$X_1$ = Independent Variable (Age)
$X_2$ = Independent Variable (Gender)
$X_3$ = Independent Variable (Total Bilirubin)
$X_4$ = Independent Variable (Direct Bilirubin)
$X_5$ = Independent Variable (Alkaline Phosphatase)
$X_6$ = Independent Variable (Alanine Aminotransferase)
$X_7$ = Independent Variable (Aspartate Aminotransferase)
$X_8$ = Independent Variable (Total Proteins)
$X_9$ = Independent Variable (Albumin)
$X_{10}$ = Independent Variable (Albumin and Globulin Ratio)




The logistic regression is presented as:

$$P(Y = 1) = \frac{1}{1 + e^{-(a + b_1 X_1 + b_2 X_2 + \dots + b_n X_n)}}$$

Here,

$Y$ = Dependent Variable
$e$ = Euler's number
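
As an illustration of the logistic function above, the following minimal Python sketch maps a linear combination of features to a probability. The function name `logistic_probability` and the coefficient values are hypothetical and are not fitted values from this study; the branch on the sign of the linear term is only there to avoid floating-point overflow.

```python
import math


def logistic_probability(intercept, coefficients, features):
    """Return 1 / (1 + e^-(a + b1*X1 + ... + bn*Xn)) for one record."""
    if len(coefficients) != len(features):
        raise ValueError("one coefficient is needed per feature")
    linear = intercept + sum(b * x for b, x in zip(coefficients, features))
    # Use the algebraically equivalent form e^z / (1 + e^z) for negative z
    # so that math.exp never receives a large positive argument.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    exp_linear = math.exp(linear)
    return exp_linear / (1.0 + exp_linear)


# Hypothetical intercept and coefficients for two features (illustration only).
# The linear term is -1.0 + 0.5*2.0 + 2.0*0.0 = 0, so the probability is 0.5.
print(logistic_probability(-1.0, [0.5, 2.0], [2.0, 0.0]))
```

A probability of at least 0.5 would typically be read as the positive class, although the threshold used in this work is not stated.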


Logistic regression - Logistic regression is a machine learning algorithm used to estimate the
relationship between a dependent variable and one or more independent variables. It is a type of
regression in which the dependent variable is binary.
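
To make the idea of fitting such a model concrete, here is a minimal sketch that estimates the intercept and coefficients by batch gradient descent on the log-loss. It is an assumption-laden illustration, not the procedure used in this study: the fitting method, the learning rate, the number of epochs, and the toy data are all made up for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def fit_logistic(rows, labels, learning_rate=0.1, epochs=5000):
    """Fit intercept a and coefficients b by batch gradient descent."""
    n = len(rows)
    n_features = len(rows[0])
    a = 0.0
    b = [0.0] * n_features
    for _ in range(epochs):
        grad_a = 0.0
        grad_b = [0.0] * n_features
        for x, y in zip(rows, labels):
            # Gradient of the log-loss: (predicted probability - label).
            error = sigmoid(a + sum(bj * xj for bj, xj in zip(b, x))) - y
            grad_a += error
            for j, xj in enumerate(x):
                grad_b[j] += error * xj
        a -= learning_rate * grad_a / n
        b = [bj - learning_rate * gj / n for bj, gj in zip(b, grad_b)]
    return a, b


def predict(a, b, x, threshold=0.5):
    """Return 1 when the predicted probability reaches the threshold."""
    return 1 if sigmoid(a + sum(bj * xj for bj, xj in zip(b, x))) >= threshold else 0


# Toy data (made up): one feature, label 1 when the feature value is large.
rows = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
labels = [0, 0, 0, 1, 1, 1]
a, b = fit_logistic(rows, labels)
print([predict(a, b, x) for x in rows])
```

In practice a library implementation would normally be preferred; the loop is written out here only to show how the binary dependent variable enters the fit.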

ACCURACY: Ratio of correctly classified subjects to all subjects. Accuracy measures how often the
prediction is correct.

PRECISION: Ratio of subjects correctly classified as positive to all subjects classified as positive.
SPECIFICITY: Ratio of the number of subjects correctly classified as negative to the total number of
negative subjects.

SENSITIVITY: Ratio of the number of true positives to the total number of positive subjects.

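• ACCURACY = $\frac{TP + TN}{TP + TN + FP + FN} \times 100$

• PRECISION = $\frac{TP}{TP + FP} \times 100$
• SPECIFICITY = $\frac{TN}{TN + FP} \times 100$
• SENSITIVITY = $\frac{TP}{TP + FN} \times 100$

Here TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives.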
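
The four measures can be computed directly from the confusion-matrix counts. The sketch below assumes hypothetical counts chosen only for illustration; the function name `classification_metrics` is likewise not from the original work.

```python
def _percent(numerator, denominator):
    """Return a percentage, or NaN when the denominator is zero."""
    if denominator == 0:
        return float("nan")
    return 100.0 * numerator / denominator


def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, specificity and sensitivity in percent."""
    return {
        "accuracy": _percent(tp + tn, tp + tn + fp + fn),
        "precision": _percent(tp, tp + fp),
        "specificity": _percent(tn, tn + fp),
        "sensitivity": _percent(tp, tp + fn),
    }


# Hypothetical counts (not results from this study).
metrics = classification_metrics(tp=40, tn=15, fp=3, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```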
KAPPA = $\frac{\text{Observed Conditions} - \text{Expected Conditions}}{100 - \text{Expected Conditions}}$

where,

Observed Conditions = % of (Overall Accuracy)

Expected Conditions = $\frac{(TP + FP)(TP + FN) + (FN + TN)(FP + TN)}{(TP + TN + FP + FN)^2} \times 100$
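
A short sketch of the kappa computation follows, assuming the standard Cohen's kappa definition in which the expected agreement is normalised by the squared total number of subjects and expressed in percent. The counts are hypothetical and the function name `cohen_kappa` is chosen for this example.

```python
def cohen_kappa(tp, tn, fp, fn):
    """Cohen's kappa from confusion-matrix counts, using percentages."""
    total = tp + tn + fp + fn
    observed = 100.0 * (tp + tn) / total
    # Agreement expected by chance, from the row and column totals.
    expected = 100.0 * ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (total * total)
    if expected == 100.0:
        # Only one class is present; kappa is undefined in that case.
        return float("nan")
    return (observed - expected) / (100.0 - expected)


# Hypothetical counts (not results from this study).
print(round(cohen_kappa(tp=40, tn=15, fp=3, fn=2), 4))
```

A value of 1 indicates perfect agreement between predicted and actual classes, and a value near 0 indicates agreement no better than chance.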


3. Results 


Table 3.1. Results of 10-fold cross-validation:

Test data (records)   Confusion matrix (row 1 / row 2)   Accuracy   Precision   Recall   Specificity
0-58                  41 ... / 0 ...                     100.0      100.0       100.0    100.0
59-117                40 0 / 0 18                        100.0      100.0       100.0    100.0
118-176               49 0 / 0 9                         100.0      100.0       100.0    100.0
177-235               42 0 / 0 16                        100.0      100.0       100.0    100.0
236-294               43 ... / 0 15                      100.0      100.0       100.0    100.0
295-353               37 0 / 0 21                        100.0      100.0       100.0    100.0
354-412               38 ... / 0 ...                     100.0      100.0       100.0    100.0
413-471               40 0 / 0 18                        100.0      100.0       100.0    100.0
472-530               40 ... / 0 ...                     100.0      100.0       100.0    100.0
531-589               44 0 / 0 14                        100.0      100.0       100.0    100.0
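
The record ranges listed for the folds suggest contiguous test blocks of 59 records each. The sketch below generates such folds under that assumption; the function name `contiguous_folds` is hypothetical, and the last block is clipped to the 583 available records rather than running to 589.

```python
def contiguous_folds(n_records, n_folds=10):
    """Yield (train_indices, test_indices) with contiguous test blocks."""
    fold_size = -(-n_records // n_folds)  # ceiling division
    for k in range(n_folds):
        start = k * fold_size
        stop = min(start + fold_size, n_records)
        test = list(range(start, stop))
        # Everything outside the test block is used for training.
        train = list(range(0, start)) + list(range(stop, n_records))
        yield train, test


for train, test in contiguous_folds(583):
    print(f"{test[0]}-{test[-1]} as test data, {len(train)} training records")
```

Each record appears in exactly one test block, so every record is used for testing once and for training in the remaining folds.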
Table 3.2. Accuracy comparing actual data with calculated data for different training fractions:

Training data fraction   Accuracy
90% (0.90)               100
80% (0.80)               100
66% (0.66)               100
50% (0.50)               100
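
For the training fractions in Table 3.2, a simple holdout split can be sketched as follows. How the records were actually partitioned (for example, whether they were shuffled first) is not stated, so taking the leading block as training data is purely an assumption of this example, as is the function name `holdout_split`.

```python
def holdout_split(records, train_fraction):
    """Split records into a leading training part and a trailing test part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    n_train = int(len(records) * train_fraction)
    return records[:n_train], records[n_train:]


records = list(range(583))  # stand-in for the 583 dataset records
for fraction in (0.90, 0.80, 0.66, 0.50):
    train, test = holdout_split(records, fraction)
    print(f"{fraction:.2f}: {len(train)} training records, {len(test)} test records")
```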


4. Conclusion 

In this paper, a model is proposed that uses multiple logistic regression for liver disease detection.
Secondary data from the UCI repository are used to calculate the relationships between the
dependent and independent variables. We computed a confusion matrix to compare the actual data
with the calculated data produced by our model. We then applied 10-fold cross-validation,
calculating the confusion matrix for each sub-list together with accuracy, precision, specificity,
sensitivity and kappa. This work is a step toward a new and improved expert system for early
detection of liver disease.